A Script Recognizer Independent Bi-lingual Character Recognition System for Printed English and Kannada Documents

Author

  • N. Shobha Rani, Department of Computer Science, Amrita Vishwa Vidyapeetham, Mysore Campus, Bogadi, Mysore, India

Abstract

Recognizing text in document images is the goal of any optical character recognition (OCR) system. This paper aims to extend the functionality of an OCR system so that it recognizes more than one language. At present, OCR technologies are able to recognize and translate only one language; multilingual recognition capabilities are accomplished by incorporating a script recognizer. This paper eliminates the need to identify the script type and achieves automatic recognition of two different scripts with a single OCR system, which we refer to as a bilingual OCR. The bilingual OCR recognizes text document images composed of both English and Kannada scripts. It is constructed using multiple projection profiles, connected component analysis, and principal component analysis. The devised system is reported to be effective and reliable, with a claimed accuracy of around 95%-100%.
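The abstract names projection profiles as one building block but does not say how they are used. As a generic illustration only (not the paper's method), the sketch below computes a horizontal projection profile of a binary image and uses empty rows to split it into text lines. The function names and the toy image are hypothetical, and the image is assumed to be a list of rows with 1 for ink and 0 for background.

```python
def horizontal_projection_profile(binary):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in binary]


def segment_lines(binary):
    """Return (start_row, end_row_exclusive) for each run of rows containing ink."""
    profile = horizontal_projection_profile(binary)
    lines = []
    start = None
    for row_index, ink_count in enumerate(profile):
        if ink_count > 0 and start is None:
            start = row_index                  # a text line begins
        elif ink_count == 0 and start is not None:
            lines.append((start, row_index))   # the text line ended on the previous row
            start = None
    if start is not None:                      # ink runs to the bottom edge
        lines.append((start, len(profile)))
    return lines


# Hypothetical toy page: two "text lines" separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

print(horizontal_projection_profile(page))
print(segment_lines(page))
```

Summing along columns instead of rows gives a vertical profile, which can be applied in the same way inside a segmented line to separate words or characters.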


Similar resources

Script Identification of Text Words from a Tri Lingual Document Using Voting Technique

In a multi-script environment, the majority of documents may contain text printed in more than one script or language. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify the different script regions of the document. In this context, this paper proposes to develop a model to identify and separate text words of Kannada, H...


Script Identification from Trilingual Documents using Profile Based Features

In a multi-script environment, the majority of documents may contain text printed in more than one script or language. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify the different script regions of the document. This paper proposes a model to identify the script type of a trilingual document printed i...


Script Identification from Bilingual Gujarati-English Documents

In a multilingual country like India, English words are often interspersed within the Indian regional languages in most official papers, school textbooks, and magazines. A bilingual Optical Character Recognition (OCR) system is therefore needed that can recognize these bilingual documents and store them for future use. In this paper the authors present an OCR system developed for the script ...


Discrimination of English to other Indian languages (Kannada and Hindi) for OCR system

India is a multilingual, multi-script country. Every state of India uses two languages: the local state language and English. For example, in Andhra Pradesh, a state in India, a document may contain text words in both English and Telugu script. For Optical Character Recognition (OCR) of such a bilingual document, it is necessary to identify the script before feeding the text w...


Global Approach for Script Identification using Wavelet Packet Based Features

In a multi-script environment, archives of documents with text regions printed in different scripts are common in practice. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify the different script regions of the document. In this paper, a novel texture-based approach is presented to identify the script type of the collection of documen...




Publication date: 2014